Legal Institutions as AI Governance Infrastructure

Work in Progress: This article is incomplete. We are actively conducting mechanistic interpretability research on legal concepts in open-source language models and developing a risk model for the use of AI in the judiciary.

About the Project

This research project sits at the intersection of AI safety, mechanistic interpretability, and jurisprudence. Specifically, we are exploring:

  • Mechanistic Interpretability: Probing the activation spaces of open-source language models to determine whether legal concepts (such as liability, breach, or jurisdiction) are represented linearly.
  • Risk Modeling: Mapping the systemic risks (such as sycophancy, reasoning that is not robust under perturbation, and susceptibility to context manipulation, as surfaced through prompt red-teaming) and the existential risks of using AI as a substitute for human judicial decision-making.
  • Jurisprudential Analysis: Grounding these empirical findings in legal theory (e.g., Dworkin's account of legitimacy) to articulate the value of judicial institutions as infrastructure for AI governance and alignment.
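To make "represented linearly" concrete, here is a minimal sketch of one common way such a question is tested: a linear probe, i.e. a logistic-regression classifier trained on activation vectors. This is not the project's method, which has not yet been published. It assumes synthetic Gaussian vectors in place of real model activations, a hypothetical planted "liability" direction, and illustrative names (synthetic_activations, train_linear_probe, run_demo). If a concept is encoded along a single direction, a probe of this kind should classify held-out examples well and its weights should point roughly along that direction.

```python
import math
import random

def synthetic_activations(n, dim, direction, shift, rng):
    """Hypothetical stand-in for hidden states: Gaussian noise, with
    positive examples (label 1) offset along a fixed concept direction."""
    data = []
    for i in range(n):
        label = i % 2
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if label:
            vec = [v + shift * d for v, d in zip(vec, direction)]
        data.append((vec, label))
    rng.shuffle(data)
    return data

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_linear_probe(data, dim, lr=0.1, epochs=50):
    """Logistic-regression probe (one weight vector plus a bias),
    trained with plain stochastic gradient descent on log loss."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def accuracy(w, b, data):
    correct = 0
    for x, y in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(data)

def cosine(u, v):
    dot = sum(a * c for a, c in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(c * c for c in v))
    return dot / (nu * nv)

def run_demo(seed=0, dim=16, shift=4.0):
    rng = random.Random(seed)
    # Unit-norm direction standing in for a hypothetical "liability" feature.
    raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(r * r for r in raw))
    direction = [r / norm for r in raw]
    train = synthetic_activations(400, dim, direction, shift, rng)
    test = synthetic_activations(200, dim, direction, shift, rng)
    w, b = train_linear_probe(train, dim)
    return accuracy(w, b, test), cosine(w, direction)

if __name__ == "__main__":
    acc, cos = run_demo()
    print(f"held-out probe accuracy: {acc:.2f}")
    print(f"cosine(probe weights, planted direction): {cos:.2f}")
```

On real models, the synthetic vectors would be replaced by hidden states extracted from a chosen layer, and a high probe accuracy would be read cautiously, alongside control tasks, since a probe can succeed by exploiting correlated features rather than the concept itself.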

We will share updates, code, and findings here as the project develops.

Published from Brisbane, Australia.